Abstract

Wong's diffusion network is a stochastic, zero-input Hopfield network [8] with a Gibbs stationary distribution over a bounded, connected continuum [13]. Previously, logarithmic thermal annealing was demonstrated for the diffusion network [10] [1], and digital versions of it were studied and applied to imaging [14]. Recently, "quantum" annealed Markov chains have garnered significant attention [12] [3] because of their improved performance over "pure" thermal annealing. In this note, a joint quantum and thermal version of Wong's diffusion network is described and its convergence properties are studied. Different choices for "auxiliary" functions are discussed, including those of the kinetic type previously associated with quantum annealing.

1 Introduction 

The optimization of a function $V(x)$, $x \in D$, when the dimension of the space $D$ is large and multiple local minima exist, is a computationally difficult problem. A class of stochastic algorithms, known as simulated annealing, has been developed for the case where $D$ is countable [7]. For optimization over a bounded continuum, a diffusion network was proposed in [13] and its thermal annealing properties were established in [10] after [7]. In this note, we study a quantum version of this system that, unlike thermal annealing, modifies the objective function $V$ in a nonlinear, nonuniform way. Past quantum annealing proposals include those involving the Schrödinger operator with potential $V$ [1], and those that add to $V$ an auxiliary function that depends on $\nabla V$ (e.g., the Ising spin glass model with an external field [3]). We consider here the latter type. Generally, the intuition behind the use of an auxiliary function is to initially perform a broader search than pure thermal annealing does.
2 Specification of a Quantum Diffusion Machine

Consider a time-inhomogeneous system described by 

$$du(t) = -\nabla\big[V(x(t)) - \Gamma(t)\tilde V(x(t))\big]\,dt + \Sigma(T(t), x(t))\,dW(t)$$
$$x(t) = G(u(t)) \qquad (1)$$

where the first equation is a stochastic differential equation of the Itô type,

$\nabla$ is the gradient with respect to the $x$ variables,

$u(t) \in \mathbb{R}^n$, where $u(t) = (u_1(t), \ldots, u_n(t))$, $n \ge 1$,

$x(t) \in (-1,1)^n$, $x(t) = (x_1(t), \ldots, x_n(t))$,

$W(t)$ is $n$-dimensional Brownian motion,

$G : \mathbb{R}^n \to (-1,1)^n$,

$V, \tilde V : [-1,1]^n \to \mathbb{R}$, $V, \tilde V \in C^2$,

$M := \sup_{x \in [-1,1]^n} V(x) - \inf_{x \in [-1,1]^n} V(x) < \infty$,

$\tilde M := \sup_{x \in [-1,1]^n} \tilde V(x) - \inf_{x \in [-1,1]^n} \tilde V(x) < \infty$,

$T \ge 0$ is the deterministic thermal (temperature) process, and

$\Gamma \ge 0$ is the deterministic quantum parameter process.

$G$ is such that $x_k = g(u_k)$, where $g$ is a sigmoid threshold function commonly found in neural networks:

$$g(u_k(t)) = \tanh(u_k(t)/w), \quad w > 0.$$
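As a quick sanity check of the slope function $f(y) := g'(g^{-1}(y)) = (1-y^2)/w$, which enters the noise coefficients below, here is a minimal Python sketch comparing that closed form with a central-difference derivative of $g$; the value $w = 0.5$ is an arbitrary illustrative choice.

```python
import math

W = 0.5   # sigmoid width w (arbitrary illustrative value)

def g(u):
    return math.tanh(u / W)           # the sigmoid g(u) = tanh(u / w)

def g_inv(y):
    return W * math.atanh(y)          # inverse of g on (-1, 1)

def f_closed(y):
    return (1.0 - y * y) / W          # closed form of g'(g^{-1}(y))

def f_numeric(y, h=1e-6):
    u = g_inv(y)
    return (g(u + h) - g(u - h)) / (2.0 * h)   # central difference for g' at u = g^{-1}(y)

for y in (-0.9, -0.3, 0.0, 0.6):
    print(y, f_closed(y), f_numeric(y))
```

Because $f$ vanishes at $y = \pm 1$, the noise $\sqrt{2T/f}$ acting on $u$ blows up near the boundary, while the corresponding coefficient $\sqrt{2Tf}$ acting on $x$ shrinks to zero there.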

If $\tilde V = 0$ and $\Sigma = 0$, then the two relations in (1) describe a continuous-time Hopfield network with Lyapunov function $V$ and no external inputs (easily realized as a "neural" network when $V$ is quadratic). If $\tilde V = 0$,

$$\Sigma(T(t), x(t)) = \mathrm{diag}\left(\sqrt{\frac{2T(t)}{f(x_1(t))}}, \ldots, \sqrt{\frac{2T(t)}{f(x_n(t))}}\right),$$

where

$$f(y) = g'(g^{-1}(y)) = \frac{1}{w}(1 - y^2),$$

and $T > 0$ is constant, then the stationary distribution of the $x$ process is Gibbs:

$$\mu(x) := \frac{1}{Z}\exp(-V(x)/T), \qquad (2)$$

where $Z$ is the partition (normalization) function. This is immediately seen by applying Itô's rule to (1), after which the Fokker-Planck operator [9] governing the distribution $p$ of the $x$ process is seen to be

$$L_\Gamma(p) = \mathrm{div}\big[A\big(T\nabla p + p\nabla(V - \Gamma\tilde V)\big)\big], \qquad (3)$$

where

$$A(x) = \mathrm{diag}(f(x_1), \ldots, f(x_n)).$$
That is, $L_0(\mu) = 0$. Furthermore, if $T(t) = T(0)/\log_2(2+t)$ (logarithmic thermal cooling), $T(0) > 2M$, and the global extrema of $V$ are attained in the interior $(-1,1)^n$, then the time-inhomogeneous process $x_t$ converges in probability to the (ground state) set that globally minimizes the objective function $V$ [7].
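For $n = 1$, the operator (3) with $\Gamma = 0$ reduces to $p \mapsto J'$ with flux $J = f\,(Tp' + pV')$, so stationarity of the Gibbs density amounts to $J \equiv 0$. The sketch below checks this numerically. The double-well objective $V(x) = 2(x^2 - 1/4)^2 + 0.1x$ and the values $w = 0.5$, $T = 0.2$ are hypothetical choices, and a uniform density is included as a contrast case whose flux does not vanish.

```python
import math

W = 0.5   # sigmoid width w (hypothetical)
T = 0.2   # constant temperature (hypothetical)

def V(x):
    return 2.0 * (x * x - 0.25) ** 2 + 0.1 * x     # hypothetical double-well objective

def dV(x):
    return 8.0 * x ** 3 - 2.0 * x + 0.1             # its derivative V'

def f(y):
    return (1.0 - y * y) / W

def flux(p, x, h=1e-5):
    # J(x) = f(x) * (T p'(x) + p(x) V'(x)); for n = 1 the operator (3) with Gamma = 0 is J'.
    dp = (p(x + h) - p(x - h)) / (2.0 * h)
    return f(x) * (T * dp + p(x) * dV(x))

def gibbs(x):
    return math.exp(-V(x) / T)      # unnormalized: the constant 1/Z does not affect J == 0

def uniform(x):
    return 0.5

grid = [-0.9 + 0.1 * k for k in range(19)]
print(max(abs(flux(gibbs, x)) for x in grid))     # essentially 0
print(max(abs(flux(uniform, x)) for x in grid))   # clearly nonzero
```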
If $T, \Gamma > 0$ are fixed, then the invariant distribution is clearly

$$\mu_\Gamma(x) := \frac{1}{Z_\Gamma}\exp\big(-(V(x) - \Gamma\tilde V(x))/T\big). \qquad (4)$$

So, if $\Gamma = o(T)$, $\mu_\Gamma$ is like a Gibbs distribution in the sense that it tends to indicate the globally minimizing (ground) states of $V$ as $T \to 0$.
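The role of $\Gamma = o(T)$ can be made quantitative. Since $\mu_\Gamma/\mu = \exp(\Gamma\tilde V/T)\,Z/Z_\Gamma$ and $Z_\Gamma/Z = \int \exp(\Gamma\tilde V/T)\,\mu\,dx$, the ratio lies in $[e^{-a}, e^{a}]$ with $a = \Gamma(\sup\tilde V - \inf\tilde V)/T$, so the total variation distance between $\mu_\Gamma$ and the Gibbs distribution (2) is at most $(e^{a} - 1)/2$, which vanishes when $\Gamma/T \to 0$. The grid sketch below checks this bound for the hypothetical choices $V(x) = 2(x^2 - 1/4)^2 + 0.1x$, $\tilde V(x) = \cos 3x$ and $\Gamma = T^2$ on $(-1,1)$.

```python
import math

def V(x):
    return 2.0 * (x * x - 0.25) ** 2 + 0.1 * x   # hypothetical objective

def V_aux(x):
    return math.cos(3.0 * x)                      # hypothetical bounded auxiliary

def density(U, T, xs):
    # Grid version of exp(-U/T)/Z as cell probabilities; shifting by min(U) avoids underflow.
    vals = [U(x) for x in xs]
    lo = min(vals)
    wts = [math.exp(-(v - lo) / T) for v in vals]
    s = sum(wts)
    return [w / s for w in wts]

def tv_to_gibbs(T, Gamma, xs):
    # Total variation distance between mu_Gamma of (4) and the Gibbs law (2).
    mu = density(V, T, xs)
    mu_g = density(lambda x: V(x) - Gamma * V_aux(x), T, xs)
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, mu_g))

N = 2000
xs = [-1.0 + (k + 0.5) * 2.0 / N for k in range(N)]
aux = [V_aux(x) for x in xs]
M_aux = max(aux) - min(aux)     # grid version of sup V_aux - inf V_aux

for T in (0.5, 0.1, 0.02):
    Gamma = T * T                # Gamma = o(T)
    bound = 0.5 * (math.exp(Gamma * M_aux / T) - 1.0)
    print(T, tv_to_gibbs(T, Gamma, xs), bound)
```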



3 Quantum convergence to the Gibbs invariant 

In [12] (and as explained in the recent survey [3]), a quantum annealing process is 
considered. They show that a faster-than-logarithmic quantum cooling schedule, $\Gamma(t) \downarrow 0$ as $t \to \infty$, can be used to establish convergence to the Gibbs invariant for fixed $T > 0$, i.e., not to the ground states. We now prove the analogous result for the diffusion network, subject to a more rapid cooling schedule, by adapting the thermal convergence proof in [7] [10]. To this end, we show how the distribution $m_t$ of $x_t$ "tracks" the distribution $\mu_{\Gamma(t)}$ (note that this is obvious for all sufficiently large $t$ if $\Gamma$ reaches zero in finite time). As the proof is a more substantive variation of [7] than for pure thermal annealing of the diffusion network, we give it in greater detail here than we did in [10].
We begin by defining

$$z_t := \int_{(-1,1)^n} \frac{m_t(x)^2}{\mu_{\Gamma(t)}(x)}\,dx,$$

where

$$\dot m_t = L_{\Gamma(t)} m_t.$$
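Since $\int m_t\,dx = 1$, the Cauchy-Schwarz inequality gives $z_t \ge 1$, with equality exactly when $m_t = \mu_{\Gamma(t)}$; indeed $z_t - 1 = \int (m_t - \mu_{\Gamma(t)})^2/\mu_{\Gamma(t)}\,dx$ is the $\chi^2$-divergence of $m_t$ from $\mu_{\Gamma(t)}$. A small one-dimensional grid sketch, with a hypothetical target density and a hypothetical initial law concentrated near $x = 0.8$; it also checks the exact identity $z(\tfrac12 m + \tfrac12\mu) - 1 = \tfrac14(z(m) - 1)$, which shows how mixing toward $\mu$ shrinks $z - 1$.

```python
import math

def normalize(ws):
    s = sum(ws)
    return [w / s for w in ws]

def z_value(m, mu):
    # Grid version of z = int m^2 / mu dx, with m and mu given as cell probabilities.
    return sum(a * a / b for a, b in zip(m, mu))

N = 400
xs = [-1.0 + (k + 0.5) * 2.0 / N for k in range(N)]
T = 0.3
# Hypothetical target: Gibbs-like density for V(x) = 2(x^2 - 1/4)^2 + 0.1x.
mu = normalize([math.exp(-(2.0 * (x * x - 0.25) ** 2 + 0.1 * x) / T) for x in xs])
# Hypothetical current law: a narrow bump near x = 0.8.
m_far = normalize([math.exp(-((x - 0.8) ** 2) / 0.01) for x in xs])
m_mix = [0.5 * a + 0.5 * b for a, b in zip(m_far, mu)]

print(z_value(mu, mu), z_value(m_mix, mu), z_value(m_far, mu))
```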
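Let $\gamma(\Gamma)$ be the gap between $0$ and the rest of the spectrum of $L_\Gamma$ [2]:

$$\gamma(\Gamma) := \inf_\phi \frac{T\int (\nabla\phi)^T A(\nabla\phi)\,\mu_\Gamma\,dx}{\frac12\iint (\phi(x) - \phi(y))^2\,\mu_\Gamma(x)\mu_\Gamma(y)\,dy\,dx},$$

subject to the constraint that $\phi$ is not constant, where integration is over $(-1,1)^n$. Equivalently,

$$\gamma(\Gamma) = \inf_\phi T\int (\nabla\phi)^T A(\nabla\phi)\,\mu_\Gamma\,dx$$

subject to $\int \phi^2\mu_\Gamma\,dx = 1$ and $\int \phi\,\mu_\Gamma\,dx = 0$.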
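Any nonconstant trial function $\phi$ gives an upper bound on $\gamma(\Gamma)$, namely $T\int (\nabla\phi)^T A(\nabla\phi)\,\mu_\Gamma\,dx / \mathrm{Var}_{\mu_\Gamma}(\phi)$ (the constrained form above, after centering and normalizing $\phi$). The sketch below evaluates this quotient on a grid for $n = 1$ (so $A = f$) with the linear trial function $\phi(x) = x$, and also checks that the variance equals $\tfrac12\iint(\phi(x)-\phi(y))^2\mu_\Gamma(x)\mu_\Gamma(y)\,dy\,dx$. The objective, $\tilde V(x) = \cos 3x$, $w$, $T$ and $\Gamma$ are hypothetical choices.

```python
import math

W, T, GAMMA = 0.5, 0.3, 0.2      # hypothetical w, T and Gamma
N = 300
h = 2.0 / N
xs = [-1.0 + (k + 0.5) * h for k in range(N)]

def U(x):
    # V - Gamma * V_aux, with hypothetical V(x) = 2(x^2 - 1/4)^2 + 0.1x and V_aux(x) = cos(3x).
    return 2.0 * (x * x - 0.25) ** 2 + 0.1 * x - GAMMA * math.cos(3.0 * x)

wts = [math.exp(-U(x) / T) for x in xs]
Z = sum(wts) * h
mu = [w / Z for w in wts]        # grid values of the density mu_Gamma

def rayleigh(phi, dphi):
    # Dirichlet form T * int f(x) phi'(x)^2 mu dx (for n = 1, A = f).
    num = T * sum((1.0 - x * x) / W * dphi(x) ** 2 * m for x, m in zip(xs, mu)) * h
    mean = sum(phi(x) * m for x, m in zip(xs, mu)) * h
    var = sum(phi(x) ** 2 * m for x, m in zip(xs, mu)) * h - mean ** 2
    # The same variance written as (1/2) int int (phi(x) - phi(y))^2 mu(x) mu(y) dy dx.
    half_double = 0.5 * h * h * sum((phi(x) - phi(y)) ** 2 * mx * my
                                    for x, mx in zip(xs, mu)
                                    for y, my in zip(xs, mu))
    return num / var, var, half_double

q, var, half_double = rayleigh(lambda x: x, lambda x: 1.0)
print(q)                   # approximate upper bound on gamma(Gamma), up to quadrature error
print(var, half_double)    # equal up to rounding
```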

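Theorem 3.1 For any nonincreasing, differentiable quantum schedule $\Gamma$ with $\Gamma(\infty) = 0$ and any constant temperature $T > 0$:

$$z_t \lesssim \left(1 + \frac{\tilde M\,\dot\Gamma(t)}{2T\gamma(\Gamma(t))}\right)^{-1} \quad \text{as } t \to \infty.$$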
Proof: Take

$$\phi_t := m_t/\mu_{\Gamma(t)},$$

and write $\mu_t := \mu_{\Gamma(t)}$ and $L_t := L_{\Gamma(t)}$, so that $z_t = \int \phi_t^2\mu_t\,dx = \int \phi_t m_t\,dx$.

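By direct differentiation,

$$\begin{aligned}
\dot z_t &= 2\int \phi_t\,\dot m_t\,dx - \int \phi_t^2\,\dot\mu_t\,dx \\
&= 2\int \phi_t\,L_t m_t\,dx - T^{-1}\dot\Gamma(t)\int \big(\tilde V - \langle\tilde V\rangle_t\big)\phi_t^2\,\mu_t\,dx \\
&\le 2\int \phi_t\,L_t(\phi_t\mu_t)\,dx - T^{-1}\dot\Gamma(t)\tilde M z_t \\
&= -2T\int (\nabla\phi_t)^T A(\nabla\phi_t)\,\mu_t\,dx - T^{-1}\dot\Gamma(t)\tilde M z_t,
\end{aligned}$$

where $\langle\tilde V\rangle_t := \int \tilde V\mu_t\,dx$. The second step uses $\dot\mu_t = T^{-1}\dot\Gamma(t)(\tilde V - \langle\tilde V\rangle_t)\mu_t$; the inequality uses $\dot\Gamma \le 0$ and $\tilde V - \langle\tilde V\rangle_t \le \tilde M$; and the last step uses $L_t(\phi_t\mu_t) = T\,\mathrm{div}(A\mu_t\nabla\phi_t)$ followed by integration by parts, which is valid because $A$ vanishes on the boundary of $(-1,1)^n$ (as $f(\pm 1) = 0$). Thus, by the previous expression for $\gamma(\Gamma(t))$ (noting $\int (\phi_t - 1)\mu_t\,dx = 0$),

$$\begin{aligned}
\dot z_t &\le -2\gamma(\Gamma(t))\int (\phi_t - 1)^2\mu_t\,dx - T^{-1}\dot\Gamma(t)\tilde M z_t \\
&= -2\gamma(\Gamma(t))(z_t - 1) - T^{-1}\dot\Gamma(t)\tilde M z_t \\
&= \big(-2\gamma(\Gamma(t)) - T^{-1}\dot\Gamma(t)\tilde M\big) z_t + 2\gamma(\Gamma(t)), \quad \forall t \ge 0.
\end{aligned}$$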

Integrating in time, we get an inequality of the form $z_t \le \alpha_t + \int_0^t \beta_s z_s\,ds$, where $\alpha_t := z_0 + \int_0^t 2\gamma(\Gamma(s))\,ds$ and

$$\beta_t := -2\gamma(\Gamma(t)) - T^{-1}\dot\Gamma(t)\tilde M.$$

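So by applying Gronwall's lemma and then multiplying by $1 = \exp(-\int_0^t \beta_r\,dr)/\exp(-\int_0^t \beta_r\,dr)$, we get

$$\begin{aligned}
z_t &\le \alpha_t + \int_0^t \alpha_s\beta_s \exp\Big(\int_s^t \beta_r\,dr\Big)ds \\
&= \frac{\alpha_t\exp(-\int_0^t \beta_s\,ds) - \int_0^t \alpha_s\,d\exp(-\int_0^s \beta_r\,dr)}{\exp(-\int_0^t \beta_r\,dr)} \\
&= \frac{z_0 + \int_0^t 2\gamma(\Gamma(s))\exp(-\int_0^s \beta_r\,dr)\,ds}{\exp(-\int_0^t \beta_r\,dr)},
\end{aligned}$$

where the last step is integration by parts (resulting in term cancellation in the numerator) and the fact that $\alpha_0 = z_0$ and $\dot\alpha_t = 2\gamma(\Gamma(t))$. (Since $\beta_t$ may take either sign, the final bound is most directly justified by comparing the differential inequality for $z_t$ with the solution of $\dot y = \beta_t y + 2\gamma(\Gamma(t))$, $y_0 = z_0$, which is exactly the last expression.) Now note that as $t \to \infty$, $\gamma(\Gamma(t)) \to \gamma(\Gamma(\infty)) = \gamma(0) > 0$ and $\dot\Gamma(t) \to 0$, and therefore $\beta_t \to -2\gamma(0) < 0$. Thus, the numerator and denominator of the previous display both diverge as $t \to \infty$. Applying L'Hôpital's rule gives

$$z_t \lesssim \frac{2\gamma(\Gamma(t))}{-\beta_t} \quad \text{as } t \to \infty.$$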
□
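The final bound of the proof is the solution of the linear comparison ODE $\dot y = \beta_t y + 2\gamma(\Gamma(t))$, $y_0 = z_0$, so its behavior is easy to see numerically. The sketch below integrates that ODE by forward Euler under two hypothetical simplifications: the gap $\gamma$ is held constant, and $\Gamma(t) = \Gamma_0/(1+t)^2$, which is nonincreasing with $\Gamma(\infty) = 0$. With the parameters used here $\beta_0 > 0$, so $y$ grows at first, but once $\dot\Gamma(t) \to 0$ the solution settles at $2\gamma/(-\beta_t) \to 1$, as the theorem predicts.

```python
def comparison_ode(z0=5.0, gamma=0.5, T=1.0, M_aux=1.0, Gamma0=1.0, t_end=40.0, dt=1e-3):
    # Forward Euler for y' = beta_t * y + 2 * gamma, y(0) = z0, where
    #   beta_t = -2 * gamma - Gamma'(t) * M_aux / T,
    # assuming (hypothetically) a constant gap gamma and Gamma(t) = Gamma0 / (1 + t)^2.
    y, t, beta = z0, 0.0, 0.0
    while t < t_end:
        dGamma = -2.0 * Gamma0 / (1.0 + t) ** 3
        beta = -2.0 * gamma - dGamma * M_aux / T
        y += dt * (beta * y + 2.0 * gamma)
        t += dt
    # Return the final y and the quasi-stationary level 2 * gamma / (-beta_t).
    return y, 2.0 * gamma / -beta

y_end, ratio = comparison_ode()
print(y_end, ratio)   # both close to 1
```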



Lemma 3.1 There exists $c > 0$, which depends on neither $T$ nor $\Gamma$, such that

$$\gamma(\Gamma) \ge cT\exp(-2M^*(\Gamma)/T),$$

where

$$M^*(\Gamma) := \sup(V - \Gamma\tilde V) - \inf(V - \Gamma\tilde V). \qquad (6)$$

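Proof: By the variational characterization of $\gamma(\Gamma)$, bounding $\mu_\Gamma$ from below by $\inf\mu_\Gamma$ in the numerator and from above by $\sup\mu_\Gamma$ in the denominator, $\gamma(\Gamma) \ge cT\inf\mu_\Gamma/(\sup\mu_\Gamma)^2$, where

$$c := \inf_\phi \frac{2\int (\nabla\phi)^T A(\nabla\phi)\,dx}{\iint (\phi(x) - \phi(y))^2\,dy\,dx}.$$

So,

$$\begin{aligned}
\gamma(\Gamma) &\ge cT\,\frac{Z_\Gamma^{-1}\exp(-\sup(V - \Gamma\tilde V)/T)}{Z_\Gamma^{-2}\exp(-2\inf(V - \Gamma\tilde V)/T)} \\
&= cT\exp(-M^*(\Gamma)/T)\int \exp\big(-[(V - \Gamma\tilde V) - \inf(V - \Gamma\tilde V)]/T\big)\,dx \\
&\ge cT\exp(-2M^*(\Gamma)/T),
\end{aligned}$$

where the last step uses $(V - \Gamma\tilde V) - \inf(V - \Gamma\tilde V) \le M^*(\Gamma)$ and the fact that $(-1,1)^n$ has volume $2^n \ge 1$. □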
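The middle equality of this proof is an exact identity that survives discretization, as does the final lower bound, since the grid version of the integral is at least the length $2$ of $(-1,1)$ times $\exp(-M^*(\Gamma)/T)$. A one-dimensional grid sketch with hypothetical $V(x) = 2(x^2 - 1/4)^2 + 0.1x$, $\tilde V(x) = \cos 3x$, $T$ and $\Gamma$ (the constant $c$ itself is not computed):

```python
import math

def lemma_ratio_check(T=0.5, Gamma=0.3, N=1000):
    h = 2.0 / N
    xs = [-1.0 + (k + 0.5) * h for k in range(N)]
    # Hypothetical U = V - Gamma * V_aux on (-1, 1).
    U = [2.0 * (x * x - 0.25) ** 2 + 0.1 * x - Gamma * math.cos(3.0 * x) for x in xs]
    U_min, U_max = min(U), max(U)
    M_star = U_max - U_min                        # grid version of M*(Gamma)
    Z = sum(math.exp(-u / T) for u in U) * h      # Riemann sum for Z_Gamma
    inf_mu = math.exp(-U_max / T) / Z
    sup_mu = math.exp(-U_min / T) / Z
    lhs = inf_mu / sup_mu ** 2
    rhs = math.exp(-M_star / T) * sum(math.exp(-(u - U_min) / T) for u in U) * h
    return lhs, rhs, math.exp(-2.0 * M_star / T)

lhs, rhs, lower = lemma_ratio_check()
print(lhs, rhs, lower)   # lhs equals rhs up to rounding, and both exceed lower
```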

Completing our adaptation of the arguments in [10]:

Corollary 3.1 For any nonincreasing, differentiable quantum schedule $\Gamma$ with $\Gamma(\infty) = 0$ and any constant temperature $T > 0$, there is a constant $K < \infty$ such that, for any $S \subset (-1,1)^n$,

$$P(x_t \in S) \le K\left(\int_S \mu_{\Gamma(t)}(x)\,dx\right)^{1/2} \quad \forall t \ge 0.$$



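Proof: Let

$$B(\Gamma, T, t) := \left[1 + \frac{\tilde M}{2cT^2}\,\dot\Gamma(t)\exp\big(2M^*(\Gamma(t))/T\big)\right]^{-1}. \qquad (7)$$

By the previous lemma and theorem (and since $\dot\Gamma \le 0$),

$$\limsup_{t \to \infty} z_t \le \lim_{t \to \infty} B(\Gamma, T, t) = 1, \qquad (8)$$

where the limit is $1$ because $\dot\Gamma(t) \to 0$ and $M^*(\Gamma(t)) \to M^*(0) = M$. Thus, by the continuity of $z_t$, there exists a positive constant $K < \infty$ such that $z_t \le K^2$ for all $t \ge 0$. So, by the Cauchy-Schwarz inequality,

$$P(x_t \in S) = \int 1_S\,m_t\,dx = \int 1_S\,\phi_t\,\mu_{\Gamma(t)}\,dx \le \left(\int_S \mu_{\Gamma(t)}\,dx\right)^{1/2} z_t^{1/2}.$$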
Substituting $z_t \le K^2$ completes the proof.



□ 



Note that $K$ will depend on the parameter $z_0$.
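The Cauchy-Schwarz step holds verbatim for any pair of discrete distributions, so it is easy to check on a grid. A sketch with a hypothetical $\mu$ (Gibbs-like at $T = 0.2$), a hypothetical $m$ (a bump centred at $0.3$) and $S = (0.2, 1)$, confirming $P(S) \le \sqrt{z\,\mu(S)}$:

```python
import math

N = 1000
h = 2.0 / N
xs = [-1.0 + (k + 0.5) * h for k in range(N)]

def normalize(ws):
    s = sum(ws)
    return [w / s for w in ws]

# Hypothetical target mu and current law m, as cell probabilities on a grid.
mu = normalize([math.exp(-(2.0 * (x * x - 0.25) ** 2 + 0.1 * x) / 0.2) for x in xs])
m = normalize([math.exp(-((x - 0.3) ** 2) / 0.05) for x in xs])

z = sum(a * a / b for a, b in zip(m, mu))
in_S = [x > 0.2 for x in xs]            # hypothetical S = (0.2, 1)
P_S = sum(a for a, s in zip(m, in_S) if s)
mu_S = sum(b for b, s in zip(mu, in_S) if s)
print(P_S, math.sqrt(z * mu_S))         # P_S <= sqrt(z * mu_S)
```

The bound is loose when $z$ is large, which is why the corollary is informative mainly once $z_t$ has settled near $1$.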



4 Global optimization of joint annealing 

To interpret this result, note that as $t \to \infty$, $\mu_{\Gamma(t)}$ defined in (4) tends to the Gibbs distribution (2) for fixed $T > 0$. Therefore, if $T > 0$ is small and $S$ does not include the ground states of $V$ (e.g., $S = \{x \in (-1,1)^n \mid V(x) \ge \theta + \inf V\}$ for some sufficiently large $\theta > 0$), then $P(x_t \in S)$ will be small. To sharpen this statement, consider joint quantum and thermal annealing.
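The claim that such a level set carries little mass at low temperature can be made precise: writing $\beta = 1/T$, the Gibbs mass satisfies $\frac{d}{d\beta}\mu(S) = -\mathrm{Cov}_\mu(1_S, V) \le 0$, because $1_S$ is a nondecreasing function of $V$, so $\mu(S)$ can only shrink as $T$ decreases. A grid sketch for a hypothetical double-well $V$ on $(-1,1)$ and the hypothetical threshold $\theta = 0.05$:

```python
import math

N = 2000
h = 2.0 / N
xs = [-1.0 + (k + 0.5) * h for k in range(N)]
Vs = [2.0 * (x * x - 0.25) ** 2 + 0.1 * x for x in xs]   # hypothetical objective
V_min = min(Vs)
theta = 0.05                                              # hypothetical threshold

def gibbs_mass_of_S(T):
    # Gibbs mass (2) of S = {x : V(x) >= theta + inf V}, on a grid.
    w = [math.exp(-(v - V_min) / T) for v in Vs]
    in_S = sum(wi for wi, v in zip(w, Vs) if v >= theta + V_min)
    return in_S / sum(w)

masses = [gibbs_mass_of_S(T) for T in (1.0, 0.3, 0.1, 0.03, 0.01)]
print(masses)   # nonincreasing as T decreases
```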

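Theorem 4.1 If $\Lambda(t) := \Gamma(t)/T(t) \to 0$, i.e., $\Gamma = o(T)$, and $D(t) := 1/T(t) = \log_2(2+t)/T(0)$ with $T(0) > 2M$, then there exists $K^* < \infty$ such that

$$P(x_t \in S) \le K^*\left(\int_S \mu_t(x)\,dx\right)^{1/2} \quad \forall t \ge 0,$$

where $\mu_t$ is given by (4) with $\Gamma = \Gamma(t)$ and $T = T(t)$.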



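Proof: Argue as in the proof of Lemma 3.1 that the gap $\gamma_t$ at $(\Gamma(t), T(t))$ satisfies

$$\gamma_t \ge \frac{c}{D(t)}\exp\big(-2D(t)M^*(\Gamma(t))\big),$$

and so conclude as in the previous corollary, where the condition $T(0) > 2M = 2M^*(0)$ figures in the resulting exponent of $(2 + t)$ after substituting for $D$. □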

So, if $S$ does not contain any of the ground states of $V$, then $\lim_{t \to \infty} P(x_t \in S) = 0$.
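Theorem 4.1 can be illustrated by simulating the $x$ process directly. Applying Itô's rule to (1) gives, coordinatewise, $dx_k = [-f(x_k)\,\partial_k(V - \Gamma\tilde V) + T f'(x_k)]\,dt + \sqrt{2Tf(x_k)}\,dW_k$, and the sketch below discretizes this by Euler-Maruyama for $n = 1$ with the logarithmic schedule $T(t) = T(0)/\log_2(2+t)$ and $\Gamma(t) = T(t)^2/T(0)$, which is $o(T)$. Everything else is a hypothetical choice: $V(x) = 2(x^2 - 1/4)^2 + 0.1x$ (for which $M \approx 1.28$, so $T(0) = 3 > 2M$), $\tilde V(x) = \cos 3x$, $w = 0.5$, the step size and the seed. The clamp to $(-1,1)$ is a crude guard against discretization overshoot and is not part of the model; since the cooling is only logarithmic, a finite run like this shows the mechanics rather than the limit.

```python
import math
import random

W = 0.5          # sigmoid width w (hypothetical)
T0 = 3.0         # T(0), chosen above 2M for the hypothetical V below

def dV(x):
    # Derivative of a hypothetical objective V(x) = 2(x^2 - 1/4)^2 + 0.1x.
    return 8.0 * x ** 3 - 2.0 * x + 0.1

def dV_aux(x):
    # Derivative of a hypothetical auxiliary V_aux(x) = cos(3x).
    return -3.0 * math.sin(3.0 * x)

def T_of(t):
    return T0 / math.log2(2.0 + t)        # logarithmic thermal schedule

def Gamma_of(t):
    return T_of(t) ** 2 / T0              # Gamma = o(T), since Gamma / T = T / T0 -> 0

def simulate(x0=0.8, t_end=50.0, dt=1e-3, seed=1):
    # Euler-Maruyama for dx = [-f(x)(V - Gamma V_aux)'(x) + T f'(x)] dt + sqrt(2 T f(x)) dW,
    # with f(x) = (1 - x^2) / w and f'(x) = -2x / w.
    rng = random.Random(seed)
    x, t, path = x0, 0.0, [x0]
    while t < t_end:
        T, G = T_of(t), Gamma_of(t)
        fx = (1.0 - x * x) / W
        drift = -fx * (dV(x) - G * dV_aux(x)) + T * (-2.0 * x / W)
        x += drift * dt + math.sqrt(2.0 * T * fx * dt) * rng.gauss(0.0, 1.0)
        x = min(max(x, -1.0 + 1e-9), 1.0 - 1e-9)   # crude guard against Euler overshoot
        t += dt
        path.append(x)
    return path

path = simulate()
print(path[-1])
```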



5 Discussion: Choices for auxiliary function 
5.1 Homotopy methods 

In "homotopy" based search [4], the auxiliary function is taken to be

$$\tilde V := V - V_0,$$

where $\Gamma(0) \approx 1$ and $V_0$ is unimodal (only one local minimum which is, of course, its global minimum). Since $V - \Gamma\tilde V = (1 - \Gamma)V + \Gamma V_0$, the ground states of $V - \Gamma(t)\tilde V$ are quickly found initially (i.e., when $t \ge 0$ is small so that $V - \Gamma(t)\tilde V \approx V_0$). Ideally, $V_0$ is the best such function approximating $V$ if suitable "global" information about $V$ is available to determine it a priori; in this case, the initial ground states (of $V - \Gamma(t)\tilde V$ for small $t > 0$) are close to those of the objective function $V$.
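Along the homotopy the modified objective moves from $V_0$ at $\Gamma = 1$ back to $V$ at $\Gamma = 0$. A sketch counting interior grid-local minima along this path, with the hypothetical double-well $V(x) = 2(x^2 - 1/4)^2 + 0.1x$ and the hypothetical unimodal $V_0(x) = x^2$:

```python
def V(x):
    return 2.0 * (x * x - 0.25) ** 2 + 0.1 * x      # hypothetical multimodal objective

def V0(x):
    return x * x                                     # hypothetical unimodal approximation

def homotopy(x, Gamma):
    # V - Gamma * V_aux with V_aux = V - V0, i.e. (1 - Gamma) * V + Gamma * V0.
    return V(x) - Gamma * (V(x) - V0(x))

def count_local_minima(F, N=2001):
    # Count strict local minima of F over the interior points of a uniform grid on [-1, 1].
    xs = [-1.0 + 2.0 * k / (N - 1) for k in range(N)]
    ys = [F(x) for x in xs]
    return sum(1 for k in range(1, N - 1) if ys[k] < ys[k - 1] and ys[k] < ys[k + 1])

for Gamma in (1.0, 0.5, 0.0):
    print(Gamma, count_local_minima(lambda x: homotopy(x, Gamma)))
```

For these particular choices the midpoint $\Gamma = 0.5$ is still unimodal, since the derivative of $(1 - \Gamma)V + \Gamma V_0$ there is $4x^3 + 0.05$, which is increasing.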
5.2 Contracting the objective function

Suppose that the auxiliary function is simply

$$\tilde V := V$$

and that $\Gamma(0) < 1$. In this case, the quantum diffusion network is performing a kind of thermal annealing from temperature $T/(1 - \Gamma(0))$ down to $T > 0$. So, this choice of auxiliary function has the effect of linearly contracting the objective function $V$, as in "pure" thermal annealing, thereby facilitating a greater breadth of search initially.
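Concretely, with $\tilde V = V$ the invariant law (4) at a frozen value $\Gamma(t) < 1$ is itself a Gibbs distribution at a rescaled temperature:

$$\mu_{\Gamma(t)}(x) = \frac{1}{Z_{\Gamma(t)}}\exp\left(-\frac{(1 - \Gamma(t))\,V(x)}{T}\right) = \frac{1}{Z_{\Gamma(t)}}\exp\left(-\frac{V(x)}{T/(1 - \Gamma(t))}\right),$$

so the effective temperature $T/(1 - \Gamma(t))$ decreases from $T/(1 - \Gamma(0))$ to $T$ as $\Gamma(t) \downarrow 0$.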

An example of a nonlinear contraction of the objective function $V$ is obtained by using the auxiliary function

$$\tilde V(x) := -e^T\nabla^2 V(x)\,e, \quad x \in (-1,1)^n, \qquad (9)$$

where $e$ is a fixed nonzero $n$-vector. Note that $\tilde V(x) > 0$, respectively $\tilde V(x) < 0$, when $x$ is a nondegenerate local maximum, respectively minimum, of $V$.
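In one dimension with $e = 1$, (9) is simply $\tilde V = -V''$, and both the sign claim and its effect can be checked numerically: subtracting $\Gamma\tilde V$ raises the bottom of a valley and lowers the top of a peak. A sketch with a finite-difference second derivative; $V$, the search intervals and $\Gamma = 0.01$ are hypothetical choices.

```python
def V(x):
    return 2.0 * (x * x - 0.25) ** 2 + 0.1 * x      # hypothetical objective

def d2(F, x, h=1e-4):
    # Central second difference: the 1-D stand-in for e^T (grad^2 F)(x) e with e = 1.
    return (F(x + h) - 2.0 * F(x) + F(x - h)) / (h * h)

def V_aux(x):
    return -d2(V, x)                                 # auxiliary (9) with n = 1, e = 1

def grid_argmin(F, lo, hi, N=20000):
    xs = [lo + (hi - lo) * k / N for k in range(N + 1)]
    return min(xs, key=F)

x_min = grid_argmin(V, -1.0, -0.2)                   # bottom of the left valley
x_max = grid_argmin(lambda x: -V(x), -0.2, 0.3)      # top of the central peak
Gamma = 0.01
for x in (x_min, x_max):
    # V - Gamma * V_aux raises the valley bottom and lowers the peak top.
    print(x, V(x), V(x) - Gamma * V_aux(x))
```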

The use of the auxiliary (9) may not result in significant contraction of the objective function (i.e., from $V$ to $V - \Gamma\tilde V$) in situations where the peaks or valleys of $V$ are very deep. In a one-dimensional ($n = 1$) setting, we can deal with this problem in the case where there is a local extremum ($V' = 0$) between successive points at which $V'' = 0$ (in particular, no saddle points) by augmenting this auxiliary using "kinetic" components (i.e., involving $V'$), which are typically associated with "quantum" annealing, e.g.,

$$\tilde V := -(e^2 + |V'|^2)V''. \qquad (10)$$

This example has a natural multidimensional form: $\tilde V := -e^T\nabla^2 V e - (\nabla V)^T\nabla^2 V\,\nabla V$. In the case where it is advantageous to further contract the objective $V$ at the points where $V'' = 0$ ($V$ and $V - \Gamma\tilde V$ for the "quantum" auxiliary $\tilde V$ of (10) are equal at these points), one can similarly propose to augment the auxiliary function with $-(e^2 + |V''|^2)V''$, etc.
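For $n = 1$, a short computation shows what the kinetic factor in (10) does: at critical points ($V' = 0$) it coincides with the one-dimensional form $-e^2V''$ of (9); at inflection points ($V'' = 0$) it vanishes, so $V - \Gamma\tilde V = V$ there; and everywhere $|\tilde V| \ge e^2|V''|$. A sketch with analytic derivatives of the hypothetical $V(x) = 2(x^2 - 1/4)^2 + 0.1x$, a bisection root of $V'$, and the hypothetical scalar $e = 0.5$:

```python
import math

EPS = 0.5   # the scalar e in (10); hypothetical value

def dV(x):   # V(x) = 2(x^2 - 1/4)^2 + 0.1x  (hypothetical objective)
    return 8.0 * x ** 3 - 2.0 * x + 0.1

def d2V(x):
    return 24.0 * x * x - 2.0

def aux_hessian(x):
    return -EPS ** 2 * d2V(x)                       # 1-D form of (9) with e = EPS

def aux_kinetic(x):
    return -(EPS ** 2 + dV(x) ** 2) * d2V(x)       # 1-D form of (10)

def bisect(F, lo, hi, iters=80):
    # Root of F on [lo, hi], assuming F(lo) and F(hi) have opposite signs.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (F(lo) < 0) == (F(mid) < 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_crit = bisect(dV, -0.7, -0.3)        # a local minimum of V (V' = 0)
x_infl = 1.0 / math.sqrt(12.0)          # an inflection point of V (V'' = 0)
print(aux_kinetic(x_crit), aux_hessian(x_crit))   # agree where V' = 0
print(aux_kinetic(x_infl))                          # ~0, so V - Gamma * V_aux = V there
```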